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Abstract 


Because  of  the  latitude  allowed  by  the  Revised  ALGOL  68  Report, 
each  implementation  has  a  slightly  different  representation  for  the 
constructs  of  the  language.   This  diversity  can  only  lead  to  confuse 
as  ALGOL  68  trained  individuals  find  they  need  readaptation  to  programs 
at  a  new  installation.   The  solution  proposed  here  is  to  develop  a  single 
hardware  representation  which  can  be  used  on  many  computer  systems.   In 
fact  this  representation  can  conveniently  be  designed  using  only  the 
intersection  of  the  graphic  characters  available  in  the  ISO  code  and 
EBCDIC. 

The  paper  also  proposes  comfortable  new  representations  for  a  few 
symbols  and  discusses  the  thorny  problem  of  distinguishing  bold  face  words, 
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Preface 


I  didn't  want  to  write  this  paper. 

After  the  Los  Angeles  meeting  of  WG2.1  approved  the  Revised  Eeport  on 
the  Algorithmic  Language  ALGOL  68  (l)  I  decided  to  spend  two  leisurely  days 
designing  a  transportation  representation  for  the  language.   It  wasn't 
that  easy  (hut  I  did  it,  it's  another  paper).   Because  the  transportation 
representation  is  an  encoding  of  ALGOL  68  program  texts,  an  adequate 
explanation  requires  sample  encoder  and  decoder  programs.   These  must 
assume  some  specific  hardware  representation.   Lacking  a  compiler  I 
decided  I  could  quickly  design  a  suitable  hardware  presentation  language 
of  my  own. 

Older  now,  but  wiser,  I  offer  the  following. 

Sections  three  and  four  explain  the  hardware  representation  and  sections 
one  and  two  explain  why  I  chose  it.   To  a  large  extent,  the  sections  can  be 
read  independently  and  the  recommendations  of  one  adopted  without  adopting 
any  other.   In  particular,  the  transportation  representation  in  no  way 
depends  on  this  hardware  representation. 
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I.  Considerations  in  the  Design  of  Hardware  Representations 

Now  that  the  Revised  Report  has  been  approved  by  WG2.1,  it  is  appropriate 
to  reconsider  the  question  of  hardware  representations  of  the  language. 
(Hereafter,  the  Revised  Report  will  be  referred  to  as  the  Report;  if  there 
were  any  references  to  the  earlier  Report,  they  would  specify  the  Original 
Report.   Most  remarks  apply  to  both,  anyway.)   The  Report  has  been  written 
with  the  thought  that  it  will  be  implemented  on  a  wide  variety  of  hardware 
with  vastly  differing  character  sets.  As  a  consequence,  it  is  not 
particularly  specific  as  to  how  any  construct  will  be  represented  on  (say) 
cards. 

Rather  than  envision  the  possibility  of  a  tower  of  Babel  of 
representations,  I  suggest  that  there  is  in  fact  a  widely  available 
set  of  graphics  with  which  all  constructs  of  the  language  can  be 
represented.   Selection  of  a  widely  available  character  set  will  bring 
these  benefits : 

o  as  trained  AlfiOL  68  programmers  move  from  implementation  to 
implementation,  they  will  be  able  to  begin  programming  and  reading  programs 
immediately  and  without  confusion. 

o  as  programs  are  sent  from  one  installation  to  another,  they  will  be 
understood  without  lengthy  explanation  and  constant  referral  by  the  reader 
between  the  text  and  a  codebook. 

o  many  installations  have  a  variety  of  devices  with  different  character 
sets.   Only  by  choice  of  a  widely  available  representation  will  programmers 
be  able  to  access  the  files  containing  their  programs  from  all  these  devices, 
including  both  terminals  and  line  printers. 

o  ALGOL  68  will  more  readily  be  accepted  by  outsiders  as  a  single 
language  and  not  a  collection  of  similar  languages. 

It  must  be  recognized  that  at  the  majority  of  installations,  and 
especially  in  North  America,  ALGOL  68  will  not  be  the  primary  language  in 
use.  For  this  reason  it  cannot  be  expected  that  the  operating  system  will 
have  provisions  especially  suited  to  the  language.   In  particular,  for  many 
years  ALGOL  68  by  itself  will  not  be  a  strong  force  determining  the  nature 
of  character  sets  provided  as  standard  by  manufacturers. 

The  many  considerations  that  affect  design  of  a  representation  can  be 
categorized  into  psychology,  ALGOL  68,  and  hardware.   These  topics  are 
covered  in  the  remainder  of  section  I.   Section  II  suggests  a  few 
representations  for  symbols  that  may  be  controversial.   It  is  important  to 
note  that  the  representation  finally  arrived  at  can  be  adjusted  to  take 
into  account  rejection  of  any  of  these  suggestions.   Section  III  details 
the  decisions  made  in  the  rest  of  the  representation  and  section  IV 
discusses  the  sticky  question  of  representations  for  bold-tags. 

I.l  Psychological  considerations. 

Care  must  be  exercised  in  the  design  of  representations  for  a  number  of 
general  psychological  reasons : 


a)  There  are  many  possible  sources  of  small  confusions  in  representation 
design:   odd.  characters,  context  dependent  usage,  dissimilar  usage  in  similar 
contexts  in  similar  languages,  breaks  in  typing  rhythms,  and  more.   Each 
instance  of  confusion  may  be  only  a  minor  annoyance,  or  it  may  interrupt  a 
train  of  thought  and  result  in  omission  of  critical  phrases.  Moreover, 
confusions  may  have  a  cummulative  effect  that  can  lead  to  frustration  and 
breakdown  in  communication. 

b)  To  some  extent  the  physical  representation  and  not  the  abstract 
'strict  language'  is  the  medium  of  thought.   This  is  true  when  writing  a 
program  and  even  more  so  when  reading  a  program.   Consequently,  variations 
between  physical  representations  ought  to  be  carefully  restricted. 

c)  When  writing  a  program,  a  trained  programmer  writes  by  reflex  and 
concentrates  instead  on  the  task  at  hand.   For  small  changes  of  representation 
when  changing  installations,  the  retraining  period  is  probably  small,  but 

if  it  can  be  avoided,  everyone  benefits. 

d)  Representations  should  be  chosen  with  an  eye  to  the  representations 
used  in  other  fields  and  in  other  languages.   The  task  of  attracting  ALGOL  68 
users  is  sufficiently  difficult  without  repelling  them  with  unusual  symbol 
choices.   (The  Report  has  excellent  symbol  choices,  I  worry  about  implementations 

Beyond  these  general  considerations,  the  designer  must  keep  in  mind  a 
number  of  human  factors  effects : 

a)  Some  reasonably  consistent  aesthetic  should  be  followed  in  the  design 
to  assist  in  readability.   The  aesthetic  of  the  ALGOL  68  reference  language  is  a 
pleasant  combination  of  natural  language  and  mathematical  conventions.   It  seems 
characterized  by  economy  of  expression  and  avoidance  of  clutter. 

b)  Simultaneous  with  aesthetics,  the  designer  must  strive  for  understand- 
ability,  unmistakability,  and  clarity. 

c)  A  specific  problem  is  that  operators  that  bind  closely  ought  to  take 
less  space  than  those  that  bind  more  loosely.   In  this  regard,  a  bold  tag  used 
as  an  operator  can  suffer  "stropping  separation".   'This  includes  not  only  the 
length  of  the  tag,  but  also  the  character (s)  needed  to  strop  it  and  any 
necessary  blanks. 

d)  Another  factor  is  the  length  of  the  text.   Too  short  a  text  may 
correspond  to  a  program  that  has  been  abbreviated  beyond  reason;  but  too  long 
a  text  slows  both  the  writer  and  the  reader.  A  longer  length  for  an 
infrequently  used  operator  is  acceptable  because  it  will  not  substantially      j 
affect  the  length  of  the  program.  .  \ 

e)  No  one  is  aided  if  an  implementation  provides  too  many  alternative 
ways  of  expressing  a  single  construct.   Programmers  are  constantly  forced  to    [ 
choose  among  alternatives;  readers  must  be  prepared  to  encounter  that  many 

more  symbols.   When  text  includes  rare  alternative  form,  the  reader  may        , 
remember  it  erroneously;  he  will  at  least  have  to  interrupt  his  reading  to 
try  to  recall  the  symbol. 


Selection  of  specific  symbols  must  depend  on  yet  other  factors: 

a)  The  existence  of  groups  of  potential  programmers  trained  in  the 
meaning  of  a  symbol.   For  example,  '+'  ought  to  mean  addition  because  most 
potential  users  have  studied  algebra. 

b)  Use  of  the  symbol  in  other  widespread  languages.   Conflicting  usage 
could  be  a  source  of  confusion. 

c)  Relation  of  a  symbol  to  its  meaning  ("graphic  onomatopoeia").   For 
example,  parentheses,  braces,  and  brackets  do  appear  to  surround  their 
contents. 

d)  The  possibility  of  a  confusing  similarity  between  two  graphics. 
Lindsey  points  out,  for  example,  that  '+'  and  '«•'  appear  alike  on  teletypes. 

1.2.   Decisions  demanded  by  the  Eeport. 

In  a  number  of  areas,  the  Report  leaves  considerable  latitude  to 
accommodate  implementations  with  varying  character  sets.   The  representation 
must  specify  what  is  allowed  for  other-string-item,  STYLE -other-PRAGMENT-item, 
style-TALLY-letter-ABC,  and  style -TALLY -monad  (R9.J4-a}.  At  least  one  means 
must  be  provided  to  write  any  operator  in  the  standard  prelude  {R9.^b}, 
decisions  must  be  made  as  to  the  values  of  abs,  repr,  'null  character', 
'error  char',  'flip',  'flop',  'blank',  and  'max  abs  char'  [RIO. 2.1},  and  a 
conversion  must  be  associated  with  'stand  conv'  {10. 5.1. 2d}. 

In  9«^  t),  the  Report  accepts  an  implementation  even  if  it  provides 
only  one  of  the  alternatives  for  each  symbol  (say  either  '@'  or  at_).   The 
intention,  in  fact,  seems  to  be  to  accept  any  representation  language  that 
provides  at  least  one  way  of  expressing  each  construct  in  the  language. 
Thus  an  implementation  need  not  necessarily  have  both  stick-symbol  and 
then-symbol,  since  with  either  one  a  choice-clause  can  be  constructed 
{R9.1.1  c,h}.   Similarly,  a  times -ten -to -the -power -symbol  might  be  omitted 
since  some  letter-e- symbol  will  usually  be  available  [R8. 1.2.1  h}. 

Selection  of  representations  for  standard  prelude  operators  is  complicated 
by  the  fact  that  not  only  do  many  operators  have  a  number  of  symbols,  but 
many  symbols  are  assigned  more  than  one  function.   Some  way  must  be  found 
through  the  mazes  of  relationships  among  uses  and  alternatives  for  't  ',  '~', 
and  '['.   Likewise,  there  are  some  complications  if  other  than  the  reference 
representations  are  chosen  for  any  of  the  multiple  symbols  that  map  into 
certain  representations  (for  example,  four  symbols  map  into  ':').   Some 
decision  must  be  made  as  to  whether  the  implementation  will  accept  the 
"allowable"  alternatives  like  '..'  for  ':'„  [R^.kh] 

Occasionally,  a  representation  designer  will  be  forced  to  consider  use 
of  a  diphthong  for  some  operation.   In  this  effort,  he  must  check  the 
operator  grammar  in  9«^'2.1  of  the  Report  to  see  that  the  diphthong  is  legal 
and  thus  will  not  cause  ambiguity.  A  secondary  consideration  is  to  try  to 
leave  intact  the  possibility  of  families  of  operators.   For  example,  diphthongs 
ending  in  '/'  should  be  avoided  because  they  are  all  available  for  a  family: 

V  V  //  -/  ^/ 


Indeed,  this  is  the  family  of  APL  reduction  operators. 

Finally,  9«^«2.2  b  specifies  that  a  bold-tag  is  composed  of  marks 
corresponding  to  its  LETTER'S  and  DIGIT'S,  where  the  'mark  corresponding 
to  each  LETTER  ([or]  DIGIT)  is  similar  to  the  mark  representing  the  corresponding 
LETTER-symbol  ([or]  DIGIT  symbol)'.   Interpretation  of  the  word  'similar'  has 
led  to  a  variety  of  "stropping  techniques".   The  representation  designer  must 
choose  one  of  these  techniques.  My  own  suggestions  will  hinge  on  the 
observation  that  nothing  is  more  similar  to  an  object  than  itself,  but  see 
section  IV  for  the  gory  details. 


1.3  Hardware  Considerations. 

Two  standard  codes  for  computer  text  have  been  defined  and  widely 
used:  the  ISO  code  (2)  (and  especially  its  ASCII  subset  (3^5))  and 
EBCDIC  (^,5).   The  former  are  used  by  most  of  the  world,  and  the  latter 
is  used  by  only  one  manufacturer.   Tables  defining  these  two  codes  are 
reproduced  in  figures  1,  2,  and  J*   Note  that  ISO  has  many  national  variants 
and  the  code  provides  spaces  where  national  groups  can  place  characters 
specific  to  their  own  needs. 

An  ALGOL  68  program  will  certainly  not  be  given  the  same  binary  encoding 
in  the  two  codes,  since,  for  example,  '+'  is  '00101110'  in  EBCDIC  and  '0101011' 
in  ISO.  However,  in  an  important  sense  one  can  make  programs  in  the  two 
codes  appear  similar;  the  graphics  chosen  to  represent  each  ALGOL  68  construct 
can  be  the  same  in  both  codes. 

Examining  the  code  tables,  we  see  that  ISO  has  a  number  of  characters 
that  will  not  be  the  same  on  every  terminal,  the  so  called  "national 
characters".   These  interfere  with  a  common  graphic  representation  of 
ALGOL  68  so  they  should  be  avoided.   Leaving  them  aside,  the  following 
characters  are  available  in  both  codes: 

upper  and  lower  case  letters :  a-z  A-Z 

digits :  0-9 

space  !   "  #  $  f^  p,   '   (   )  *  +  ^   -  •  / 

:   ;  <  =  >  ?  @  _ 

The  following  control  characters  appear  in  both  codes  and  ought  to 
be  considered  in  design  of  a  representation: 
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Extended  Binary- 
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■Decimal  Interchange  Code 

(EBCDIC ) 

(Chart  adapted  from   (k).) 
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Figure  2.   ISO  7 -bit  Coded  Character  Set 
(also  known  as  ASCII) 


Some  codes  are  not  officially  assigned  graphics,  but  the 
preferred  alternatives  are  shown  in  parentheses.   The  right 
hand  side  of  columns  32,  80,  and  112  show  the  alternatives 
chosen  for  ASCII.   (ISO  chart  taken  from  (2).) 
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(Taken  from  Lindsey  (2).) 
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A  few  comments  on  individual  characters: 

'#'  is  not  ISO  standard,  but  is  the  ASCII  choice  for  position  35  in 
ISO.   The  alternative  is  '£',  a  currency  symbol  as  is  ';^'. 

•@'  is  not  ISO  standard,  hut  is  the  preferred  alternative  for  position 
6k   and  is  only  infrequently  assigned  other  graphics. 


'  I  ',  '!  ',  '-',  and  '-' 


Extended  discussion  of  these  follows. 


The  manufacturer  supporting  EBCDIC  has  helped  introduce  endless  confusion 
into  the  codes  for  these  graphics.   The  following  charts  illustralTe  this  by 
giving  the  hexadecimal  location  in  the  code  of  various  graphics. 

ISO  Alter-  IBM  IBM 

code:     ISO    natives     ASCII-8   ASCII-8   EBCDIC-8   EBCDIC 
(references)   (2)      (2)        (3,5)      (h)  (5)        (k) 


graphic 


70     u  d 
21 

5D     A  U  S 


kl 

21  kF 

7C      FC  6a 

5D        BD  5A 


kF 


8e 
6e 


BE 

6e 


6e 
Be 


5F 


Al 


5F 


Many  terminal  control  software  routines  interpret  the  "best"  meaning  of 
characters.   Thus  on  some  North  American  timesharing  systems  a  '!'  at  an 
ASCII  terminal  is  converted  to  an  EBCDIC  ' | '  by  the  time  it  reaches  the 
executing  program.   In  such  an  environment  or  even  in  more  benign 
environments  there  are  then  eight  characters  that  can  be  produced  by  the 
system  in  response  to  some  code  that  at  one  time  was  supposed  to  be  ' | ' : 
!   '   ]  ^  A  U  il  d.   Similarly,  '-%'  can  be  ~  n  -  t  and  p.   This 
peregrination  can  render  a  character  a  poor  choice  for  an  important  task 
in  a  representation  language. 

Another  "feature"  of  terminals  and  terminal  support  routines  is  the 
translation  of  lower  case  characters  to  upper  case  because  time-sharing 
supervisors  expect  upper  case  commands.   This  will  influence  the  choice  of 
letter  symbols  and  stropping  techniques  below. 
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The  final  set  of  hard-ware  problems  is  concerned  vd.th  format  effectors. 
In  particular,  should  backspace  and  carriage  return  be  permitted  to  instruct 
the  compiler  to  reconstruct  the  input  text  image  exactly  as  it  would  appear 
on  a  page  produced  by  an  ISO-code  terminal?  If  so,  bold  stropping  could  be 
accomplished  by  backspacing  and  underlining,  or  even  by  returning  the 
carriage  and  underlining.   For  a  number  of  reasons,  such  composite  characters 
are  bothersome  and  must  not  be  allowed: 

0  mechanically,  backspace  is  a  slow  operation.  For  example  an  IBM 
Selectric© backspaces  J+2^  slower  than  it  forward  spaces.  This  delay  is 
enough  to  break  typing  rhythm  and  cause  marginal  discomfort  and  confusion. 

1  many  terminals  do  not  interpret  ASCII  backspace  correctly  and  replace 
the  character  after  repositioning  the  typing  element  (or  cursor).  R.  W.  Bemer 
has  conrplained  about  this  in  a  letter  to  Datamation  (8),  but  I  do  not  expect 
his  plea  to  stem  the  tide. 

o  a  compiler  that  interprets  the  printed  image  is  not  interpreting  the 
characters  in  their  sequential  presentation.  Error  indications  may  not  be 
correctly  associated  with  the  text,  and  even  if  so  there  may  be  no  good 
clue  as  to  how  the  text  is  stored  in  a  file  and  should  be  modified.   This 
problem  would  not  be  as  bad  if  all  editing  were  done  interactively,  but  that 
is  not  always  economical. 

°  some  systems  use  only  carriage  return  as  a  signal  for  end  of  line, 
so  if  the  compiler  were  interpreting  the  image,  all  the  characters  would  be 
on  one  line.   (Certainly  the  compiler  for  such  an  installation  would  be 
more  clever,  but  it  is  a  curious  thought.) 
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II.   Five  ALGOL  68  Symbol  Set  Suggestions 

1.  plus -i -times -symbol:   '+*' 

Few  devices  provide  'J_',  the  reference  language  character  for  the 
plus -i -times -symbol.   It  can  be  constructed  from  overlaid  characters,  but 
these  are  a  questionable  recourse  at  best.   The  report  also  suggests  i_, 
but  this  suffers  stropping  separation  or,  if  reserved  words  are  used, 
conflicts  with  the  identifier  'i'.   The  Lindsey  ISO-code  representation 
proposes  '!'  and  '  |_',  but  these  encounter  the  ISO  vertical  bar  uncertainty. 

In  Algol  Bulletin  '^k   W.  Freeman  (6)  proposed  that  modulo-operation  be 
represented  by  'tX'.   This  has  been  accepted  in  the  Revised  Report,  and  a 
parallel  construction  suggests  '+X'  for  the  plus -i -times -symbol.  More 
usually,  this  will  be  written  as  '+-^';  but  note  that  this  does  not 
interfere  with  the  x-asterisk  family  of  diphthongs,  because  the  family 
members  '^•^'  and  "^*'  already  have  standard  meanings.   Note:   '+*'  would 
also  be  used  in  transput  to  represent  plus-i -times. 

Examples :  3+*5  u  +*  v 

(R  10.2.3.7J)   (re  a+re  b)  +^  (im  a+im  b) 

(R  11.1)       (x  >  0  I  rp+*ip  I  abs_  ip+*(y  >  0  |  rp  |  -rp)) 

2.  of -symbol:   '-<' 

At  Los  Angeles,  '->'  was  removed  as  a  representation  of  of -symbol 
because  it  points  the  wrong  way  -  from  son  to  parent,  andbecause  in  practice 
it  points  the  opposite  way  from  the  arrow  in  a  similar  construct  in  PL/I. 
Only  of  is  left  and  it  suffers  extreme  stropping  separation  for  an  operation 
that  binds  even  more  tightly  than  a  monadic  operator.   One  proposal  for 
of-symbol  is  '.',  but  again  this  has  the  opposite  meanings  to  the 
corresponding  construct  in  PL/I.   I  derived  an  alternative  to  _of  by 
starting  from  the  is-an-element -of-symbol,  'e',  and  then  considering  ' -< ' 
and  ^' .   The  latter  has  an  excellent  diphthong:   '-<',  so  I  propose  that 
the  Report  list  of  and  '%<  '  while  the  "approved  alternative  representation" 
be  '-<'.   Note:   It  should  cause  little  difficulty  that  '-C'  is  also  the 
symbol  for  a  photocathode. 

Examples:   father -<p 

(R10.2.3.Tc)   sqrt(re-<a**2  +  im-<a**2 ) 

(RlO.J.l.ld)   p-<a  >  p-<b  or  p-<:a  =  p-<b 

and  (l-<a  >  l-<b 

or  l-<a  =  l-<b 

and  c-<a  >  c-<b) 
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3.   stick-symbol,  again-symbol,  or-symbol 

The  Report  proposes  that  the  stick-symbol  be  represented  by  '| ',  a 
reasonable  enough  choice  given  the  name.   Three  problems  arise:   the 
problems  of  * | '  migration  in  ISO;  the  fact  that  PL/l  uses  this  character 
for  'or',  so  it  appears  in  boolean  contexts  in  that  language  as  well  as 
in  ALGOL  68;  and  the  fact  that  the  symbol  has  no  inherent  relation  to 
decision.   Despite  these  problems  a  brief-in-symbol  is  essential  to  avoid 
stropping  separation  in  if-clauses  yielding  a  value.   The  '?'  meets  all  the 
objectives  of  the  stick-symbol  (which  I  would  now  rename  the  decision-symbol): 
it  is  available  in  most  character  sets  without  ambiguity,  it  is  unused  in 
other  languages,  and  it  naturally  implies  that  a  decision  is  being  made. 

A  disadvantage  of  the  stick-symbol  is  that  it  is  also  used  for  in  and 
out  in  case-clauses,  and  '?'  would  be  less  meaningful  in  this  context. 
Consider,  however,  the  unfortunate  similarity  of  ' (a  ?  b,  c  ?  d) '  and 
'(a  ?  b;  c  ?  d) ' .   (it  is  just  as  unfortunate  with  s tick- symbols. )  It  would 
be  too  drastic  to  eliminate  the  brief  case  clauses,  but  why  not  specify  that 
CASE-brief-in  and  CASE -brief -out  be  in  and  out?  Then  the  example  would  be 
' (a  in  b,  c  out  d)',  a  reasonable  result,  even  with  stropping  separation. 
The  case -again  should  not  have  a  brief  form:  it  occurs  in  long  clauses 
that  easily  confuse  a  reader's  parse;  it  is  non-intuitive  -  an  integral-again 
seemingly  ought  to  select  a  new  range  for  the  original  enquiry  clause. 

The  proposal  for  choice  clause  tokens  can  be  diagrammed: 

start   in  again  out  finish 


IF -brief 
CASE -brief 
IF -bold 
CASE -bold 


( 

9 

? : 

9 

) 

( 

in 

xxxxx 

out 

) 

if 

then 

elif 

else 

fi 

case 

in 

ouse 

out 

esac 

Reassigning  the  brief-in  frees  the  stick-symbol  for  representation  of 
or.   For  the  time  being,  at  least,  I  will  not  propose  this  because  many 
systems  will  still  use  stick  for  decisions.   I  would,  howevever,  like  to 
contradict  a  statement  in  the  Lindsey  ISO-code  representation  paper: 
"Hopefully,  it  [the  or-symbol]  is  not  as  common  as  &  [the  and-symbol],  so 
we  shall  restrict  it  to  just  or."  In  view  of  DeMorgan's  laws,  this 
assumption  seems  dubious.   The  proposed  Revised  Report  (LAl(25l)  version  2) 
uses  k-"]   and's  and  only  27  or's,  but  1^  of  these  and's  are  in  checks  to  see 
whether  a  given  operation  can  be  performed  on  a  given  file.   Changing  these 
to  or's  is  not  only  conceivable,  but  would  enable  the  "good  programming 
practice"  of  specifying  the  shortest  alternative  first.   For  example, 
'put  char'  could  begin: 

if  not  opened  of  f  or  not  put  possible  (f) 

then  undefined 


else  . . . 
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Examples : 

(Rll.l)  (rp  =  0   ?   0   ?  y/(2   *  rp)) 

(RIO. 5 -Id.)      (pofa<pofb   ?   false 

?:   pofa>pofb    ?  true 

?:   lofa<lofb    ?   false 

?:   lofa>lofb    ?  true 

?  cofa>cofb) 

(RIO. 3. 5-2  a  string) 

(y(j)  in 

(ref  char  c) : 

(upb  s=l?c:=s?  incomp  :=  true)  , 

(ref  ()  char  cc) :  I 

(upb  s  =  upb  cc  -  Iwb  cc  ?  cc  (:)  :=  s  (:)  ?  incomp  :=  true)  , 

(ref  string  ss):  ss  :=  s 

out  incomp  :=  true)  I 

II. i^.   Disentangling  U ',  't  ',  '-',  and  '[' 

Because  few  devices  provide  these  graphics,  the  Report  allows  a  profusion 
of  alternatives.   The  down-symbol,  up-symbol,  tilde-symbol,  and  floor-symbol 
are  variously  specified  to  replaceable  by  the  bold  symbols  down,  shr,  up,  shl,   1 
skip,  n ot ,  Iwb,  and  entier;  also  tilde  may  be  synonymous  with  -i  and  up-arrow    r 
with  **.  Unfortunately,  the  synonymity  is  context  dependent.   For  example 
entier  may  replace  the  floor-symbol  in 

[  (flex  I  al  I  xl)  (l) 

but  the  same  replacement  cannot  be  made  in 

[  (flex  I  al  I  xl). 

The  intertwining  of  symbols  for  exponentiation,  shift  left;,  and  raise 
semaphore  is  even  more  complex.  We  can  diagram  the  relations  thus 
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raise  semaphore 


An  instance  of  u£  cannot  readily  be  associated  with  the  correct  operation  by 
either  a  reader  or  a  compiler.   Moreover,  the  hapless  programmer  is  given 
little  guidance  as  to  how  he  might  be  able  to  write  his  program  to  avoid 
possible  reader  confusion. 

These  synonymities  were  introduced  to  permit  an  ALGOL  68  program  to  be 
written  on  any  device,  but  only  one  representation  is  required  at  any  one  time. 
It  is  not  difficult  to  assign  symbols  to  operations  so  no  symbol  represents 
more  than  one  operation.   In  addition  to  reducing  reader  ambiguity,  this 
assignment  simplifies  construction  of  encoders  for  transportation  representations, 
If  the  problems  are  resolved  as  follows,  programs  can  still  be  represented  on 
any  device,  but  the  synonymity  conflicts  are  removed.  Where  possible, 
non-stropped  representations  are  also  proposed. 


boolean  negation 
skip 

lower  bound 
entier 


not 


skip 

Iwb 
entier 


as  in  the  Report 

no  ~,  the  standard  prelude  does 
not  use  the  tilde  and  has  no 
problem 

no  '[• 


no 


neither  operation  has  a  stronger  claim  to  [  than  the 
other,  and  it  would  be  silly  to  allow  entier  to  replace 
Iwb;  this  restriction  will  not  make  programs  appreciably 
longer 


lower  semaphore 
raise  semaphore 
shift  left 


down 

up 

shl 


as  in  the  Report 
as  in  the  Report 


shift  right 


shr 


it  is  not  intuitively  clear  why  'up'  should  mean  'left' 
and  not  'right'  ;  I  considered  suggesting  '<*'  and  '*>', 
but  their  meanings  could  be  forgotten  since  shifts  are 
infrequent 
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exponentiation  t  "^^  pow        *^   will  seldom  be  unavailable; 

t  is  kept  because  it  looks 
right;  pow  will  do  if  the 
others  are  imavailable. 

Examples: 

(in  an  assembler)     op  shl  8  or  adrs 

(tautology  test)  (not  (2  pow  k  /=  2rl  shl  k)  |  skip  |  undefined) 

(binary  search) 
proc  bis ear ch  =  (()  int  a,  int  arg,  default)  int: 

(int  val  :=   default;  bits  i  :-   2rl  shl  (bits  width  -  l); 

while  abs  i  >  upb  a  -  Iwb  a  +  1  do  i  :=  i  shr  1  od; 

int  k  :=  abs  i  +  Iwb  a  -  1; 

while  abs  i  /=  0  do  i  :=  i  shr  1; 

(a(k)  <  arg  |  k  :=  min  (k+abs  i,  upb  a) 

I :  a(k)  =  arg  |  val  :=  k;   i  :=  bin  0  |  k  -:=  abs  i) 

od;   val) 

II  5.   'flip':  true,  'flop':  false 

For  bool  and  bits,  ALGOL  68  violates  the  principle  that  valid  denotations 
should  be  valid  transput  values.  As  there  seems  to  be  no  good  reason  for  this, 
I  propose  the  elimination  of  'flip'  and  'flop'.   The  value  'put'  for  true  will 
be  'true';  that  for  false,  'false' .   'Get'  for  a  bool  will  read  a  tag  and  set 
the  bool  to  true  if  the  tag  starts  with  't',  false  for  'f,  and  call  'char 
error  mended'  if  the  tag  starts  with  neither.   The  transput  value  for  a  bits 
will  be  a  BITS -denotation. 
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III.   The  Design  of  the  Hardware  Eepresentation 

This  section  covers  the  points  raised  in  section  k   of  Lindsey's  paper. 
The  Representation  is  detailed  in  Appendix  A_,  and  section  III.  10  below 
contains  some  notes  to  explain  that  Appendix. 


III.l  Letter  tokens 

The  central  question  here  is  whether  to  allow  both  upper  and  lower  case 
letters  in  tags  and  the  letters  assigned  denotation  functions  (a,b,  c,  d, e, f^ 
g,i,k,  1,  n,  p,  q,  r,  s^x,  y,  z  for  radix,  times  ten  to  the  power,  and  format 
markers).   In  fact  few  programmers  will  want  to  have  tags  differing  only 
in  the  case  of  one  or  more  letters.   (is  there  really  enough  difference 
between  'ox'  and  'OX'?  Is  'scan'  not  'Scan'?)  Moreover,  many  line  printers 
are  normally  operated  with  only  upper  case.   Therefore,  the  basic  answer  to 
the  central  question  is  that  internally  the  compiler  will  have  only  one 
case  of  letter  symbols  (which  one  is  immaterial) .   For  string  denotations 
and  transput  of  strings,  though,  case  will  be  distinguished.  We  now  consider 
three  categories  of  terminals  and  three  contexts  for  tags  and  denotation 
function  letters: 


UC/liC  terminal, 
UC  means  bold 


tag  and 
denotation 
function  letters 


denotation  function  letters 
(a,b,c,  d,  e,  f,  r,  t) 


input 
UC/lC* 


output 
LC 


UC/LC  terminal, 
case  not  used 
for  stropping 


UC/LC^ 


UC/LC^ 


LC 


UC  terminal 


UC* 


UC* 


LC** 


*Converted  to  one  case  internally. 
**System  software  or  terminal  converts  to  UC. 

Appendix  A  lists  lower  case  letters  for  ISO  and  upper  case  letters  for  EBCDIC. 
This  is  only  because  these  characters  sets  are  normally  available  on  the 
corresponding  equipment. 


Of  the  national  variant  characters  in  the  last  foirr  ISO  columns,  only 
'@'  and  '~'  are  assigned  a  function  in  this  hardware  representation.   The 
other  eight  are  available  for  use  as  additional  letter-ABC -symbols.  When 
they  are  not  so  used  they  can  serve,  as  suggested  below,  as  style -TALLY -monads 
and  other-string-items. 
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111. 2.  Bold  tags. 

This  hardware  representation  is  designed  to  be  applicable  to  any 
stropping  convention.   To  facilitate  this,  only  one  case  is  assvimed  and  no 
function  is  assigned  to  ' ' '  and  '_'.   Stropping  is  discussed  in  section  IV. 

111. 3.  Composite  characters. 
Outlawed  as  discussed  in  section  I.  3' 

Ill.i^-.   Carriage  Return,  Line  Feed,  and  Delete. 

As  Lindsey's  paper  suggests,  CR,  LF,  CR/LF,  LF/CR  (my  addition),  form 
feed,  and  vertical  tab  all  should  terminate  a  line  of  input  to  the  compiler. 
Delete,  backspace,  and  all  other  control  characters  should  be  ignored. 

111.5.  Notes  on  Particular  Representations 

The  controversial  representations  in  this  design  have  been  discussed 
in  section  II.  A  few  further  choices  deserve  comment  here. 

'e'  for  times -ten-to-the-power-symbol 

The  Report  implies  that  both  'e'  and  some  other  graphic  ('  „  '  or  '\') 
will  be  available  for  this  symbol.   It  does  not  seem  valuable  to  have  two 
or  three  graphics  mean  the  same  thing,  especially  where  one  is  widely 
available  and  the  others  are  not. 

(space)  for  space -symbol 

The  Report  proposes  that  both  the  space  and  visible  space  ('^')  be 
available.   The  latter  cannot  be  allowed  because  it  would  have  to  be 
composite  on  available  equipment.   Using  underlines  for  visible  spaces  does 
not  help  because  they  run  together  and  are  no  easier  to  count  than  spaces. 

'& '  for  and-symbol 

This  is  reasonable  and  follows  Lindsey's  proposal. 
Iwb  and  entier  for  floor-symbol 

These  two  bold  tags  are  not  interchangeable  alternatives. 

'-)'  and  '~'  for  not-symbol 

Section  I  has  described  the  ridiculous  confusion  as  to  the  location  of 
these  symbols,  including  the  possibilility  that  in  some  circumstances 
not-symbol  would  be  typed  as  't  ' .   One  solution  would  be,  as  with  stick-symbol, 
to  suggest  an  alternate  representation  or  to  abolish  the  graphics  in  favor  of 
not.  The  situation  is  not  as  serious  with  not-symbol,  however,  because 
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not-synibol  is  only  an  operator  and  its  interpretation  is  not  crucial  to 
interpretation  of  the  structure  of  the  program  text.   Hence,  't'  and  '~' 
are  both  allowed,  depending  on  which  character  code  is  used.   Note  that,  to 
avoid  even  further  confusion,  '~'  is  not  a  representation  of  the  tilde-symbol 
and  cannot  be  used  in  TAO's. 

' ? '  for  ' error  char ' 

The  report  makes  no  proposal  for  this  value.   'Error  char'  is  used  in 
transput  to  be  the  string  value  for  a  value  that  cannot  be  translated  as 
specified.   Is  it  not  reasonable  that  this  situation  should  be  signalled 
with  the  ? 

'  '  for  'blank' 

This  assignment  corresponds  to  the  decision  to  use  (space)  for 
space -symbol. 

'{'  ...  '}'  for  ISO  brief -comment -symbol 

The  desirability  of  this  representation  is  shown  by  PASCAL  programs. 
Some  brief-comment-symbol  is  necessary  because  'vf  may  not  be  available  on 
all  ISO  terminals.   There  is  no  reason  the  compiler  cannot  insist  that  the 
left-brace  start  a  comment  and  the  right-brace  end  it. 

111. 6  Other-string -it ems,  other -pragment-i terns. 

The  characters  available  for  these  will  certainly  vary  depending  on  the 
code  in  use;  I  am  not  trying  to  define  an  implementation  independent  language. 
Essentially  every  character  that  can  be  typed  should  be  allowed  in  these 
positions,  except  for  the  format  effector  characters,  which  only  control  the 
listing  of  the  program.   (Though  a  format  effector  could  be  a  string  value 
by  means  of  transput  or  repr. )  Except  for  the  quote -image -symbol  every 
character  in  a  string  will  represent  itself;  there  will  be  no  composite 
characters  or  representations  of  one  character  by  several. 

111. 7  Abs,  repr,  and  conversion. 

Abs  and  repr  should  certainly  be  designed  so  the  position  of  a  character 
in  the  code  is  its  abs  value.   In  ISO,  abs  "A"  =  65  and  repr  65  =  "A";  in  EBCDIC 
abs  "A"  =  193  and  repr  193  =  "A".  Moreover,  in  both  codes  the  null  character 
will  be  repr  0.  As  far  as  possible,  programs  should  be  written  so  as  to  be 
independent  of  the  actual  values  associated  with  abs  and  repr.  As  the 
encoder  in  my  transportation  representation  paper  shows,  this  can  be  achieved 
with  minimal  effort  except  for  control  codes.  Appendix  A  proposes  predefined 
identifiers  for  the  six  format  affector  control  codes. 

Lindsey's  paper  proposes  two  conversion  regimes.   'Stand  conv'  transmits 
printable  characters  unchanged  but  ignores  most  control  codes.   The  format 
codes  would  create  the  requisite  spaces  to  position  the  text  as  it  would 
appear  on  paper.   Backspace  would  cause  a  call  of  'char  error'  of  the  file. 
His  other  conversion  was  'complete  conv'  which  transmitted  all  characters 
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unchanged.   Ignoring  these,  Lindsey's  paper  contains  a  program  that  uses 
what  is  called  'special  conv'.   In  fact  this  latter  is  an  important  conversion 
that  should  be  available  for  system  programmers  and  anyone  else  -who   vra.nts  to 
know  exactly  how  the  user  encoded  spaces  (and  wants  to  avoid  the  processing 
time  required  to  skip  over  them).   I  would  call  this  conversion  'layout 
encoded  conv':  printable  characters  and  format  effectors  are  transmitted 
intact,  but  other  control  codes  are  ignored.  When  'get'ing  characters,  each 
code  is  delivered  in  turn.  When  'get'ing  strings,  a  line  is  terminated  by 
any  format  effector  except  horizontal  tab;  the  tenninating  character  is 
delivered  as  the  last  character  of  the  line.   If  the  line  is  terminated  with 
CR/LF  or  LF/CR,  both  will  appear  at  the  end  of  the  line.   (This  will  depend 
on  whether  the  operating  system  delivers  both  codes  to  the  ALGOL  68  monitor. ) 

III. 8.  Use  of  remaining  characters. 

Wo  virtue  is  attained  by  associating  a  standard  meaning  with  every 
available  character.   Indeed,  a  few  ought  to  remain  available  as  style-TALLY- 
monads.   The  characters  not  yet  assigned  are 

(ISO)  [  ]  !  I  -  \ 

(EBCDIC)  I  !  jzf 

Of  these  the  brackets  should  not  be  monads  because  they  would  normally  be 
used  in  pairs  as  delimiters,  but  the  others  can  be  monads. 


It  is  unfortunate  that  the  report  makes  no  provision  for  style -TALLY -nomads 
because  the  given  set  {=,<,>,*, \ /)   is  limited  and  many  of  its  combinations  are 
in  use.   Indeed,  of  the  characters  above,  it  would  not  be  unreasonable  to 


specify  that  ' !  ', 


and  '•^'  could  not  be  monadic,  but  could  be  nonads. 


With  these  new  nomads  there  could  be  dyadic  diphthongs  like  '*! ', 


and  even 


Appendix  A,  however  proposes  no  new  nomads. 
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III. 9.   Representations  with  smaller  character  sets. 

When  only  a  subset  of  the  characters  assumed  above  is  available,  bold 
tags  should  be  used  to  make  up  for  missing  characters. 


if  this  graphic 
is  missing 

I0 


@ 

< 
> 

? 


these  bold  tags 

can  replace  its  uses 


over 


mod 


not 

at 

CO 

It  le  of 

si  ge 

and 

then  else 

elif 
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If  worst  comes  to  worst,  colon  and  semicolon  will  be  missing  and  must  be 
replaced  with  iigly  diphthongs  as  suggested  in  R9.^b.   Compilers  should 
implement  these  only  if  numerous  devices  at  the  installation  lack  the 
specified  graphic. 

the  graphic  is  replaced  by 


+= :  plus to 

:= :  is 

/= :  isnt 

(the  second  and  third  of  these  are  no  longer  mentioned  in  the  Report,  but 
still  do  not  lead  to  ambiguity.   It  can  be  remembered  that  semicolon  is  '., ' 
and  not  ',.'  because  real-denotations  can  start  with  '.'  but  cannot  end  with 
it,  so  that  no  ambiguity  exists  in  '3.,5'  for  '3;5'«) 

In  cases  where  standard  graphics  are  absent  but  other  symbols  are  available, 
the  temptation  to  use  the  latter  should  be  resisted  with  passion. 

The  minimum  character  set  for  ALGOL  68  is  thus: 

letters,  digits,  =,  +,  -,  *,  ,,  .,  (,  ),  $,  /,  ",  and  possibly 
some  stropping  character. 

III. 10.  Guide  to  reading  Appendix  A. 

Some  symbols  in  Appendix  A  are  marked  boldface  by  underlining  them. 
They  are  to  be  keywords,  reserved  words,  or  stropped  words,  according  to 
whatever  convention  is  adopted. 

The  first  listed  "representation"  for  a  symbol  should  be  used  if  possible. 
If  no  representation  is  possible  (and  the  symbol  is  not  intended  as  a  MONAD), 
the  first  possible  graphic  listed  in  the  "alternatives"  column  should  be 
chosen.  Up-syrabol  and  down-symbol  have  neither  representation  nor  alternative 
and  cannot  be  written  in  a  program  {see  section  11.^}. 

Please  note  that  "typographic  display  features"  (spaces,  new  lines,  and 
new  pages)  cannot  divide  the  characters  of  a  diphthong  or  bold  tag. 

Numerals  in  braces  refer  to  sections  of  this  paper  where  decisions  are 
discussed.  Where  no  comment  is  made,  the  representation  follows  the  suggestions 
of  the  Revised  Report. 
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rv.   stropping  Recommendations 

ALGOL  60  invented  the  concept  of  indivisible  symbols  represented  by 
boldface  identifiers.  ALGOL  68  extended  this  concept  to  allow  the  user  to 
invent  his  own  bold  tags.   In  the  Revised  Report,  it  is  made  clear  that 
these  symbols  are  no  longer  indivisible,  but  are  composed  of  ordinary 
letters  possibly  set  off  in  some  way  to  indicate  their  type  face.   The 
Report  does  not  suggest  how  they  are  to  be  set  off,  but  gives  five  examples 
IR9.I1-. 2. 2b}  including  one  using  script  letters,  a  feature  found  on  few 
existing  computer  transput  devices. 

Implementations  have  adopted  diverse  stropping  conventions;  some  have 
even  attempted  to  implement  several  conventions  with  provision  for  pragmatic 
selection  of  one  at  compile  time.   Such  attempts  at  universality  seem  doomed 
to  failure,  however.   Without  much  effort  I  was  able  to  write  a  list  of 
twenty  different  stropping  conventions,  most  incompatible  with  two-thirds 
of  the  rest.   Instead,  the  compiler  writer  should  implement  one  convention. 
Moreover,  each  program  will  at  one  time  or  another  be  transput  on  every 
device  in  the  installation,  so  the  stropping  convention  chosen  must  be 
applicable  to  every  device. 

Traditionally,  ALGOL  60  programs  have  been  stropped  with  apostrophes 
at  the  beginning  and  end  of  each  boldface  symbol.   To  me  this  convention 
seems  unduly  cluttering.   I  have  always  been  struck  by  the  clean  aesthetics 
of  ALGOL:   short  symbols,  indentation  emphasized  as  a  tool,  no  semi -colon 
before  else,  ALGOL  68 's  brief  comment  forms,  the  top-down  structure  of 
program  texts.   In  this  atmosphere,  apostrophes  seem  as  appropriate  as  neon 
lights  in  a  Japanese  garden.  Examine 

'begin'  'real'  x;  'char'  c; 

get  (f,  (c,x)); 

'if  c  =  "i"  'then'  x+1  'else'  x  'fi'  'end' 

Note  that  in  the  natural  (Western)  way  of  looking  top-to-bottom  and 
left -to -right,  the  first  item  the  eye  encounters  is  not  a  piece  of  information, 
but  is  the  interspersive  apostrophe.   Indeed,  the  most  important  distinction 
between  pairs  of  words  is  their  first  letters,  but  the  first  'letter'  of 
all  apostrophied  words  is  the  same.  Note  also  that  a  hurried  programmer  who 
used  other  languages  might  waste  time  over  the  confusion  between  'if  and 
"i";  not  much  time,  but  perhaps  enough  to  lose  his  train  of  thought. 
Moreover,  there  is  no  one  apostrophe  stropping  method.   Lindsey  lists  three, 
and  my  list  included  five,  only  one  of  which  has  been  made  illegal  by  the 
Revised  Report. 

Another  stropping  technique  that  has  been  proposed  is  underlining;  it 
does  not  clutter  and  has  a  rather  pleasant  appearance.   Its  difficulties  are 
not  aesthetic,  but  operational:   it  takes  longer  to  enter  and  revise  a 
prograjn  with  underline  stropping,  many  text  editors  and  line  printer  routines 
do  not  support  underlining.   Mechanical  backspacing  is  a  slow  operation. 
Finally,  underlining  constructs  composite  characters  and  suffers  the 
disadvantages  of  listings  that  are  not  in  one-one  correspondence  with  the 
file. 
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Apostrophes  clutter.   Underlining  has  operational  problems. 
I  reject  them  both  for  stropping.   (Blithely  ignoring  the  derivation  of 
the  word  'stropping'.)  What  do  I  propose  instead?  Reserved  words.  However, 
many  implementers  will  feel  they  must  provide  some  stropping  convention  so 
I  discuss  below  four  conventions  in  order  of  increasing  preference. 

IV. 1.  Postfix  underline  is  bold  (least  favored,  but  better  than 
sixteen  other  schemes). 

The  least  intrusive  stropping  scheme  is  to  append  an  underline  to 
the  end  of  a  bold  word.   The  reader  can  easily  ignore  the  character,  but  it 
is  there  to  resolve  ambiguity  if  he  needs  it.  A  trained  ALGOL  reader  scans 
the  indentation  before  examining  the  text.  With  prefix  stropping,  most 
lines  begin  with  the  low-information-content  strop  character;  with  postfix 
stropping,  the  strop  character  is  out  of  the  way. 

Our  earlier  random  program  would  look  like  this: 

begin_real_x;   char_c; 

get  (f,  (x,  c)); 

if_  c="i"  then_  x+1  else_  x  fi_  end_ 

Note  that  '_'  alone  is  enough  and  no  space  is  needed.   One  of  the  advantages 
of  this  notation  to  the  programmer- cum- keypuncher  is  that  the  underline  is 
typed  in  the  same  sequence  that  it  would  be  drawn:   last  after  the  rest  of 
the  word.   Thus  written  work  could  still  underline  bold  words  and  the 
transliteration  while  keypunching  is  not  onerous. 

In  a  small  way,  this  convention  is  compatible  with  complete  underlining 
of  bold  words.   If  backspaces  are  ignored  and  multiple  underlines  converted 

to  a  single  underline,  the  compiler  can  accept  "begin  3PPP3  "  (where 

'P'  means  backspace)  as  equivalent  to  "begin_".   But  the  user  could  not 
type  "bP_eP_gP_iP_nP_"  and  achieve  the  same  effect.   (Alternatively,  the 
latter  could  mean  begin  and  a  non-letter  be  required  to  end  a  bold  tag. 
This  introduces  undue  stropping  separation  in,  for  example,  "re  of_  z".) 

Postfix  underline  stropping  is  poor  for  terminals  with  a  non-escaping 
underline  key.   They  would  have  to  type  "begin_  "  which  would  be  printed 
as  "begin_".   Presumably  implementors  for  such  terminals  would  choose  some 
convention  later  in  this  section. 


IV. 2.  Upper  case  is  bold. 

One  convention  gaining  wide  use  in  Europe  is  case  stropping:  bold  tag: 
in  upper  case  and  tags  in  lower  case. 

BEGIN  REAL  x;   CHAR  c; 

get  (f,  (c,  x)); 

IF  c="i"  THEN  x+1  ELSE  x  FI  END 
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If  the  tags  are  chosen  to  be  real  words,  they  look  more  like  normal  text  if 
they  are  lower  case.   (Theoretically,  no  space  woiild  be  required  between 
"REAL"  and  "x",  but  it  might  be  considered  gauche  to  omit  it.   Consider 
"reOFz".) 

Case  stropping  is  excellent,  except  that  existing  devices,  especially 
line  printers,  often  support  only  upper  case.   One  could  postulate  combining 
this  technique  with  underlining  or  apostrophes  to  be  used  on  an  upper  case 
only  device.   Unfortunately,  this  fails  because  in  systems  with  only  one 
case,  it  is  usually  upper  case;  plain  tags  would  be  upper  case  and  thus 
mistaken  for  bold. 


17.3.   Lower  case  or  postfix  underline  is  bold. 

A  reversal  of  the  previous  convention  allows  combination  of  both  the 
first  two  conventions.  Bold  words  would  be  lower  case  (as  they  are  in  the 
Report)  or  would  be  followed  by  an  underline: 

BEGIN_real  X;  char  C; 

GET  (F,  (X,C)); 

if  C="i"  THEN_  X+1  else  X  fi  end 

(The  mixed  case  if-THEN_-else-fi  looks  unpleasant  and  is.   I  postulate  that 
in  this  particular  case  it  was  forced  on  the  programmer  because  he  was  fixing 
the  "THEN_"  at  an  upper  case  terminal.)   Presumably,  the  user  would  use  case 
distinction  at  a  mixed  case  terminal  and  fall  back  on  postfix  underline  at 
an  upper -case -only  terminal.  An  upper  case  line  printer  would  print  all  as 
upper  case,  and  the  user  would  have  to  rely  on  context  to  distinguish  bold 
from  roman.   The  latter  is  never  an  arduous  chore  in  a  well  written  program. 

The  only  disadvantage  with  this  third  stropping  convention  is  that  the 
second,  contradictory,  convention  has  achieved  considerable  useage, 

IV. k.     Reserved  words  are  bold  (most  preferred  technique). 

Few  languages  other  than  ALGOL  have  stropping  conventions,  most  rely 
on  reserved  words  or  a  language  design  that  makes  all  distinctions 
determinable  from  immediate  context.  Many  successful  ALGOL  compilers  have 
avoided  stropping.   Several  ALGOL  68  implementations  (at  least  Vancouver, 
Dartmouth,  Illinois  Institute  of  Technology,  and  probably  many  more)  have 
worked  out  techniques  to  minimize  or  eliminate  stropping.   Here  is  a  random 
program  without  stropping: 

begin  real  x;  char  c; 

get  (f,  (x,  c)); 

if  c  =  "i"  then  x+1  else  x  fi  end 
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Is  there  any  reason  to  believe  that  the  human  reader  should  have  difficulty 
distinguishing  bold  from  roman  in  this  program? 

One  interesting  point  is  the  fact  that  the  Eevised  Report  lists  20^ 
more  predefined  reman  words  than  bold  tags.   To  be  sure,  the  roman  words 
can  all  be  redefined  with  far  less  penalty  than  a  redefinition  of  if;  but 
still  they  constitute  a  large  class  of  identifiers  the  user  must  remember. 

In  the  past,  automatically  generated  parsers  have  behaved  poorly  when 
confronted  with  a  reserved  word  used  as  an  identifier.   There  is  no  inherent 
need  for  this  to  be  the  case,  and  work  such  as  that  reported  by  Graham  and 
Rhodes  (7)  is  showing  how  to  avoid  these  problems. 

What  is  to  be  done  about  identifiers  that  contain  a  reserved  word  as 
an  integral  part,  for  example  "year  to  date"?  It  is  possible  to  forbid 
this,  but  that  leaves  too  many  opportunities  for  the  programmer  to  forget 
that  something  is  a  reserved  word.  After  all,  the  entire  identifier  is 
far  from  any  reserved  word.   Instead,  I  suggest  that  underlines  replace 
blanks  in  identifiers.  Where  an  identifier  is  continued  from  one  line  to 
the  next,  it  may  have  embedded  blanks,  but  it  must  end  with  an  underline  on 
the  first  line  or  begin  with  an  underline  on  the  second  line.   Either 

year_to_ 

date 

or 

year_to 

_date 

will  be  acceptable  renderings  of  "year_to_date " . 

In  ALGOL  68  a  user  can  declare  arbitrary  tags  to  be  bold  face.   There 
are  three  reasonable  techniques  for  dealing  with  these.   (l)  Such  tags  would 
be  iinstropped  and  treated  as  bold  only  in  the  block  where  they  were  so 
declared.   This  technique  implies  that  the  token  scanner  will  analyze  the 
block  structure,  a  good  idea  because  it  can  aid  automatic  correction  of 
bracket  errors.   (2)  They  could  be  unstropped,  but  treated  as  bold  throughout 
the  compilation.  At  least  one  implementation  (IIT)  has  chosen  this  route. 
(3)  They  could  be  stropped  in  some  way.   Underlines  serve  as  spaces  in  roman 
tags,  so  they  cannot  be  used;  apostrophes  at  both  ends  are  unnecessary; 
postfix  apostrophes  are  almost  as  unobtrusive  as  postfix  -underlines;  so  I 
propose  that  a  postfix  apostrophe  be  used. 

Much  as  I  dislike  stropping  of  standard  bold  tags,  I  recommend  the  third 
method  of  distinguishing  user  created  bold  tags.   They  will  certainly  be 
less  used  than  syntactic  words  like  if  and  real,  and  usually  they  will  not 
appear  more  than  once  in  a  phrase.  More  critically,  they  will  be  unfamiliar 
to  a  reader  of  the  program,  so  he  deserves  to  have  them  set  off  in  some 
manner.  Here  is  a  part  of  a  program: 
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mode  token'  =  struct  (int  type,  string  val,  rtok'next), 

rtok'  =  ref  token'; 
rtok'toklist  :=  nil,  eop  :=  heap  token'; 
ref  rtok'tokput  :=  toklist; 
sema  tokens_ready  =  level  0, 

may_move_tokens  =  level  1; 

Note  that  in  its  use  as  a  mode  or  a  monadic  operator,  a  user  defined  bold 
tag  is  separated  from  its  object  by  the  apostrophe.   It  can  be  viewed  as  a 
notation  that  the  thing  to  its  left  operates  on  the  thing  to  its  right. 
Moreover,  a  postfix  apostrophe  is  somewhat  like  normal  usage  where  an 
apostrophe  may  appear  near  the  end  of  a  word. 

Some  emergency  method  of  stropping  may  be  necessary  anyway  to  handle 
cases  where  a  tag  is  declared  both  bold  and  roman.   In  such  cases,  it  would 
be  assumed  roman  unless  stropped  or  unless  the  syntactic  position  demanded 
that  it  be  bold.   This  mechanism  is  one  way  to  solve  the  problem  of  're' 
and  re_  and  'im'  and  im. 

(Writing  this,  I  have  come  to  worry  about  "re"  and  friends.  Why  do 
both  operator  and  field  selector  exist?  Synonymously,  is  compl  a  primitive 
mode  or  is  it  a  struct?  If  it  is  a  mode  then  we  need  only  operators  to 
process  it  and  an  operator  to  construct  objects  of  that  modes  (which  we 
have:   '!')•  Viewed  thus,  it  makes  no  sense  to  assign  a  value  to  part  of 
the  primitive  object  "z".   On  the  other  hand,  given  the  stropping  separation 
of  re  and  im  and  given  "-<"  for  of^  it  is  just  as  reasonable  to  eliminate 
re  and  im  from  the  language.) 

Final  thought:   I  believe  an  efficient  compiler  can  be  written  so  that 
no  more  than  29  words  are  reserved: 

BEGIN,  BY,  CASE,  CO,  COMMENT,  D0,  ELIF,  ELSE,  END,  ESAC,  FI, 
FLEX,  FOR,  FROM,  IF,  IN,  MODE,  OD,  OP,  OUSE,  OUT,  PR,  PRAGMAT, 
roiO,  PROC,  REF,  STRUCT,  THEN,  TO,  UNION,  WHILE. 

With  these  words,  the  structure  of  the  text  can  be  determined.   The  other 
bold  words  in  the  Report  could  be  redeclared,  but  could  not  be  used  in  their 
bold  sense  within  that  block. 

I  offer  this  limited-reserved-words  approach  as  a  challenge  to  parser 
implementers. 
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Appendix  A.   Proposed  Hardware  Representation 

Where  there  is  a  difference  between  the  ISO  and  EBCDIC  representations, 
the  respective  parts  are  labeled  (iSO)  and  (EBCDIC). 


8.1.4  Character  denotations 

d)   other  string  item  :  style  i  monad  {942b}; 


{111.6} 


(iSO)  ~  [  ]  {  ]  (and  upper  case  letters) 

(EBCDIC)  -I  (non-printing  codes  above  'space';  lower  case  letters). 


9.1.1  Syntax 


(111.3} 


IF  ::  choice  using  boolean. 
CHOICE  brief  start  :  open  token. 
IF  bold  start  :  if  token. 
CASE  bold  start  :  case  token. 
IF  brief  in  :  decision  token. 
IF  bold  in  :  then  token. 
CASE  STYLE  in  :  in  token. 
IF  brief  again  :  again  token. 
IF  bold  again  :  else  if  token. 
CASE  bold  again  :  out  case  token. 
IF  brief  out  :  decision  token. 
IF  bold  out  :  else  token. 
CASE  STYLE  out  :  out  token. 
CHOICE  brief  finish  :  close  token. 
IF  bold  finish  :  fi  token. 
CASE  bold  finish  :  esac  token. 


{there  is  no  'CASE  brief  again'.   The  provisions  above  can  be  diagrammed: 

start   in   again  out  finish 
IF  brief 
CASE  brief 
IF  bold 
CASE  bold 


( 

9 

?: 

9 

) 

( 

in 

X 

out 

) 

if 

then 

elif 

else 

fi 

case 

in 

ouse 

out 

esac 

out  esac   ] 


9.2  Comments  and  pragmats 

d)   STYLE  other  FRAGMENT  item  :  quote  symbol; 

other  string  item  (except  not  STYLE-PRAGMENT-symbol). 


{111.6} 
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9.'4-.2  other  TAX  symbols 

a)   style  i  letter  ABC  :  [if  case  is  not  used  for  stropping]        tlH-l} 
(ISO)  upper  case  of  letter  ABC  symbol 
(EBCDIC)  lower  case  of  letter  ABC  synibol 

(internally  all  STYLE-letter-ABC ' s  are  converted  to  one  case}. 


b)   style  i  monad  :  (ISO)  !  \  |  '^ 

(EBCDIC)  \    \    i 
9.^.2.2  Representation 


[111.8] 


{IV] 


b)  Stropping.   {This  representation  is  compatible  with  all  proposed 
stropping  conventions  and  favors  none  (pun  intended).] 

9.4.1  Representations  of  symbols 


a)   Letter  symbols 

(III.l] 

symbol 

(ISO) 

(EBCDIC ) 

symbol 

(ISO) 

(EBCDIC ) 

letter  a 

symbol 

a 

A 

letter 

n 

symbol 

n 

N 

letter  b 

symbol 

b 

B 

letter 

0 

symbol 

0 

0 

letter  c 

symbol 

c 

C 

letter 

P 

symbol 

P 

P 

letter  d 

symbol 

d 

D 

letter 

q 

symbol 

q 

Q 

letter  e 

symbol 

e 

E 

letter 

r 

symbol 

r 

R 

letter  f 

symbol 

f 

F 

letter 

s 

symbol 

s 

S 

letter  g 

symbol 

g 

G 

letter 

t 

symbol 

t 

.  T 

letter  h 

symbol 

h 

H 

letter 

u 

symbol 

u 

U 

letter  i 

symbol 

i 

I 

letter 

V 

symbol 

V 

V 

letter  j 

symbol 

J 

J 

letter 

w 

symbol 

w 

W 

letter  k 

symbol 

k 

K 

letter 

X 

symbol 

X 

X 

letter  1 

symbol 

1 

L 

letter 

y 

symbol 

y 

Y 

letter  m 

symbol 

m 

M 

letter 

z 

symbol 

z 

Z 

b)  Denotation  symbols 
symbol 


representation 


digit  zero  symbol 

0 

digit  one  symbol 

1 

digit  two  symbol 

2 

digit  three  symbol 

5 

digit  four  symbol 

k 

digit  five  symbol 

5 

digit  six  symbol 

6 

digit  seven  symbol 

7 

digit  eight  symbol 

8 

digit  nine  symbol 

9 

point  symbol 

• 

times  ten  to  the  power 

symbol 

(ISO)  e 

(EBCDIC)  E 

(III. 5] 
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symbol 

true  symbol 

false  symbol 

quote  symbol 

quote  image  symbol 

space  sjrmbol 

comma  symbol 

empty  symbol 

c )   Operator  symbols 

symbol 

or  symbol 
and  symbol 
ampersand  symbol 
differs  from  symbol 
is  less  than  symbol 
is  at  most  symbol 
is  at  least  symbol 
is  greater  than  symbol 
divided  by  symbol 
over  symbol 
percent  symbol 
window  symbol 
floor  symbol 
ceiling  symbol 
plus  i  times  symbol 
not  symbol 
tilde  sj.Tribol 
down  symbol 
up  symbol 
plus  symbol 
minus  symbol 
equals  symbol 
times  symbol 
asterisk  symbol 
assigns  to  symbol 
becomes  symbol 


representation 

true 
false 


(space) 


{111.5} 


empty 


representation   alternates 


& 
< 


> 

/ 


(EBCDIC)  "• 


or 

& 

and 

(III. 5 

( 

/= 

It 

ne 

<= 
>= 

le 
St 

i 

over 

elem 

Iwb, 
upb 

entier 

{II. ^ 
{11.^} 
{11.1} 

III. 5] 

(ISO)  ~  not  [Tl.h,    III. 5] 
{II. i^} 


> 


e£ 

■X- 


(Although  they  are  not  listed  in  S.k,    the  following  are  defined  in  chapter  10.} 


exponentiation  operator 
modulo  operator 
plus  and  becomes  operator 
minus  and  becomes  operator 
times  and  becomes  operator 
divided  by  and  becomes  operator 
over  and  becomes  operator 
modulo  and  becomes  operator 
plus  to  operator 
shift  left  operator 
shift  rif^ht  operator 
raise  semaphore  operator 
lower  semaphore  operator 


+ 


■X- 

9 


pow     t 
mod 
plusab 
minusab 
time sab 
divab 
overab 
mo  dab 


plus  to 

shl 

shl 

"^own 


(11.^} 


; 


(11.^} 
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d)  Declaration  symbols 
As  in  the  Revised  Report 

e)  Mode  standards 

As  in  the  Revised  Report 

f)  Syntactic  symbols 

symbol 

bold  begin  symbol 
bold  end  symbol 
brief  begin  symbol 
brief  end  symbol 
and  also  symbol 
goon  symbol 
completion  symbol 
label  symbol 
parallel  symbol 
open  symbol 
close  symbol 
decision  symbol 
again  symbol 
if  symbol 
then  symbol 
else  if  symbol 
else  symbol 
fi  symbol 
case  symbol 
in  symbol 
out  case  symbol 
out  symbol 
esac  symbol 
colon  symbol 
brief  sub  symbol 
brief  bus  symbol 
style  i  sub  symbol 
style  i  bus  symbol 
up  to  symbol 
at  symbol 
is  symbol 
is  not  symbol 
nil  symbol 
of  symbol 
routine  symbol 
go  to  symbol 
go  symbol 
skip  symbol 
formatter  symbol 


representation 

begin 
end 


) 


par 


? : 

if 

then 

elif 

else 

fi 

case 

in 

ouse 

out 

esac 


@ 

nil 

-< 

goto 

go 

skip 


}  {11.3} 


at 
is 
isnt 

of 


CII.3} 
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g)   Loop  symbols 

No  change  from  Report 

h)   Fragment  symbols 

symbol 

brief  comment  symbol 
bold  comment  symbol 
style  1  comment  symbol 
style  ii  comment  symbol 
bold  pragmat  symbol 
style  i  pragmat  symbol 

10.2.1  Environment  enquiries 


representation 

(ISO)  (...} 
comment 

CO 

# 
pragmat 

HI 


p)   int  max  abs  char  =  (iSO)  127   (EBCDIC)  255; 
q)  char  null  character  =  repr  0; 


r) 

char  flip  =  "t"; 

s) 

char  flop  =  "f"; 

t) 

char  errorchar  = 

u) 

char  blank  =  "  " 

v)   char  horizontal  tab  =  repr  ((EBCDIC)  5   (iSO)  9), 
backspace  =  repr  ((EBCDIC)  22    (iSO)  8), 
carriage  return  =  repr  15, 
line  feed  =  repr  ((EBCDIC)  57    (iSO)  10 ), 
vertical  tab  =  repr  11, 
form  feed  =  repr  12; 

10.6.1.   Library  Preludes 

a)  proc  complete  conv  =  (ref  book  b)  conv: 
(conv  c; 
for  i  from  0  to  max  abs  char  do 


c); 


(aleph  of  c)  (i)  :=  (repr  i,  repr  i)  od; 


{111.5} 


b)  proc  layout  encoded  conv  =  (ref  book  b)  conv: 

#  characters  to  be  ignored  are  set  to  null  # 

(conv  c; 

for  i  from  0  to_  abs  blank  -  1  do 

(aleph  of  c"5  {i)    :-    (null  character,  repr  i)  od; 
for  i  to  6  do 

char  ch  =  (i  in  horizontal  tab,  backspace, 

carriage  return,  line  feed,  vertical  tab, 
form  feed) ; 
(aleph  of  c)  (abs  ch)  :=  (c,  c)  od; 
for  i  from  abs  blank  to  max  abs  char  do 


c); 


(aleph  of  c )  (i)  :=  (repr  i,  repr  i)  od; 
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